Supercomputer “Fugaku”
Formerly known as Post-K 


Supercomputer Fugaku Project 


RIKEN and Fujitsu are currently developing Japan's next-generation flagship supercomputer, the successor to the K computer, as the most advanced general-purpose supercomputer in the world.


[Figure: awards (HPCG and others): No.1 (2017), Finalist (2016), No.1 (2018); systems shown: K computer, PRIMEHPC FX10]
Goals and Approaches for Fugaku

Goals
- High application performance
- Good usability and wide range of uses
- Keeping application compatibility


RIKEN announced predicted performance:
- More than 100x faster than the K computer for GENESIS and NICAM+LETKF
- Geometric mean of speedup over the K computer in the 9 Priority Issues is greater than 37x
https://postk-web.r-ccs.riken.jp/perf.html
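The geometric mean quoted for the Priority Issues can be computed as sketched here. The nine speedup values are hypothetical placeholders chosen only for illustration; they are not RIKEN's published per-application figures.

```python
import math

def geometric_mean(values):
    """Return the geometric mean of positive numbers via the mean of their logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical speedups over the K computer for nine applications
# (placeholders only, not measured or published values).
speedups = [120.0, 105.0, 45.0, 40.0, 35.0, 30.0, 25.0, 20.0, 15.0]

gm = geometric_mean(speedups)
am = sum(speedups) / len(speedups)
print(f"geometric mean: {gm:.1f}x, arithmetic mean: {am:.1f}x")
```

The geometric mean never exceeds the arithmetic mean, so a few very large speedups pull it up less, which is one reason it is commonly used to summarize ratios.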
Approaches

Develop:
1. High-performance Arm CPU A64FX in HPC and AI areas
2. Cutting-edge hardware design
3. System software stack

Achieve:
- High performance in real applications
- High efficiency in key features for AI applications
1. High-Performance Arm CPU A64FX in HPC and AI Areas

Architecture features
- Armv8.2-A (AArch64 only)
- SVE (Scalable Vector Extension): SIMD width 512-bit; FP64/32/16, INT64/32/16/8
- 48 computing cores + 4 assistant cores (4 CMGs)
- HBM2: Peak B/W 1,024 GB/s
- TofuD: 28 Gbps x 2 lanes x 10 ports
- I/O: PCIe Gen3 16 lanes

CMG (Core Memory Group) specification
- 13 cores
- L2 Cache 8 MiB
- Memory 8 GiB, 256 GB/s
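The chip-level figures follow from the per-CMG specification by simple multiplication. The sketch below checks that arithmetic; the 12 + 1 split of computing and assistant cores per CMG is inferred from the totals (48 + 4 over 4 CMGs), and the TofuD number is a raw signalling rate that ignores any encoding or protocol overhead.

```python
# Per-CMG figures as listed in the specification.
CMGS_PER_CHIP = 4
CORES_PER_CMG = 13             # inferred split: 12 computing + 1 assistant core
MEM_BW_PER_CMG_GBS = 256       # GB/s
MEM_PER_CMG_GIB = 8
L2_PER_CMG_MIB = 8

total_cores = CMGS_PER_CHIP * CORES_PER_CMG
total_mem_bw_gbs = CMGS_PER_CHIP * MEM_BW_PER_CMG_GBS
total_mem_gib = CMGS_PER_CHIP * MEM_PER_CMG_GIB
total_l2_mib = CMGS_PER_CHIP * L2_PER_CMG_MIB

# TofuD interconnect: lane rate x lanes per port x ports (raw, before overhead).
TOFUD_LANE_GBPS = 28
TOFUD_LANES_PER_PORT = 2
TOFUD_PORTS = 10
tofud_raw_gbps = TOFUD_LANE_GBPS * TOFUD_LANES_PER_PORT * TOFUD_PORTS

print(f"cores per chip:        {total_cores}")
print(f"memory bandwidth:      {total_mem_bw_gbs} GB/s")
print(f"memory capacity:       {total_mem_gib} GiB")
print(f"L2 cache:              {total_l2_mib} MiB")
print(f"TofuD raw signalling:  {tofud_raw_gbps} Gbps")
```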


Peak performance (Chip level), in TOPS: A64FX (Fugaku) vs. SPARC64 VIIIfx (K computer)
Peak (TOPS) by element size    64 bits   32 bits   16 bits   8 bits
A64FX (Fugaku)                 2.7+      5.4+      10.8+     21.6+
SPARC64 VIIIfx (K computer)    0.128     0.128     -         -
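The A64FX values are consistent with peak throughput doubling each time the element size halves, because a 512-bit SIMD register holds twice as many elements. The sketch below reproduces the series from the 64-bit figure; treating the scaling as purely proportional to lane count is a simplifying assumption, and the source quotes the 8-bit figure specifically for the INT8 partial dot product.

```python
SIMD_WIDTH_BITS = 512
PEAK_64BIT_TOPS = 2.7  # lower bound quoted for 64-bit elements

def lanes(element_bits):
    """Number of elements that fit in one 512-bit SIMD register."""
    return SIMD_WIDTH_BITS // element_bits

def scaled_peak(element_bits):
    """Peak in TOPS, assuming throughput scales with the lane count."""
    return PEAK_64BIT_TOPS * lanes(element_bits) / lanes(64)

for bits in (64, 32, 16, 8):
    print(f"{bits:>2}-bit elements: {lanes(bits):>2} lanes, {scaled_peak(bits):.1f}+ TOPS")
```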


Copyright 2019 FUJITSU LIMITED


2. Cutting-edge Hardware Design

- 1 PFlops by Fugaku and K computer
  - Fugaku: 1x rack including SSDs, 1.1 m² (0.8 m x 1.4 m)
  - K computer: 80x compute racks & 20x disk racks, 128 m² (4 m x 32 m)

- Scalable design

  Number of CPUs        1        2        16      48       384    150k+
  Performance [Flops]   2.7 T+   5.4 T+   43 T+   129 T+   1 P+   400 P+

- CMU (CPU Memory Unit)
  - 100% direct water cooling
  - 3x QSFP for AOC (Active Optical Cables)
  - Single-sided blind mate connectors for electrical signals and water

[Figure: CMU photo with the water coupler labeled]
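The scalable-design row can be checked by multiplying the CPU count by the per-chip peak. This sketch assumes the 2.7 TFlops lower bound per CPU and ideal linear scaling; the helper names are illustrative only.

```python
PEAK_PER_CPU_TFLOPS = 2.7  # lower bound per A64FX chip, as quoted in the source

def aggregate_tflops(num_cpus):
    """Aggregate peak in TFlops, assuming ideal linear scaling."""
    return num_cpus * PEAK_PER_CPU_TFLOPS

def format_flops(tflops):
    """Render a TFlops value with a T or P prefix."""
    if tflops >= 1000.0:
        return f"{tflops / 1000.0:.1f} PFlops"
    return f"{tflops:.1f} TFlops"

for n in (1, 2, 16, 48, 384, 150_000):
    print(f"{n:>7} CPUs -> {format_flops(aggregate_tflops(n))}")

# Floor area needed for about 1 PFlops, from the rack dimensions given.
FUGAKU_AREA_M2 = 0.8 * 1.4   # one rack
K_AREA_M2 = 4.0 * 32.0       # 80 compute racks + 20 disk racks
footprint_ratio = K_AREA_M2 / FUGAKU_AREA_M2
print(f"footprint ratio (K computer / Fugaku): about {footprint_ratio:.0f}x")
```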
3. System Software Stack

- Fujitsu is developing system software in collaboration with RIKEN
- Fujitsu Technical Computing Suite implements development and execution environments with great usability on large-scale systems

Software stack (top to bottom):

Fugaku applications
Fujitsu Technical Computing Suite / RIKEN-developed system software
  Management software
  - System management for high availability & power saving operation
  - Job management for higher system utilization & power efficiency
  File system
  - FEFS: Lustre-based distributed file system
  - LLIO: NVM-based file I/O accelerator
  Programming environment
  - XcalableMP
  - MPI (Open MPI, ...)
  - OpenMP, Coarray
  - Compilers (C, C++, Fortran)
  - Debugging and tuning tools
Linux OS / McKernel (Lightweight kernel)
Fugaku system hardware
Programming environment
- MPI (Open MPI, ...)
- OpenMP, Coarray
- Compilers (C, C++, Fortran)
- Debugging and tuning tools
- Math. lib

- Exploits hardware performance by compiler optimizations such as SVE vectorization
- Supports new programming language standards and data type FP16
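To illustrate what an FP16 data type trades off, the sketch below rounds a value to IEEE 754 binary16 and binary32 using the `e` and `f` format characters of Python's standard `struct` module. It is a language-neutral illustration of the number format only and does not represent the Fujitsu compilers' FP16 support.

```python
import struct

SIMD_WIDTH_BITS = 512

def round_trip(value, fmt):
    """Round a Python float to the precision of the given struct format."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def to_fp16(value):
    return round_trip(value, "<e")   # IEEE 754 binary16

def to_fp32(value):
    return round_trip(value, "<f")   # IEEE 754 binary32

def elements_per_vector(fmt):
    """How many elements of this format fit in one 512-bit SIMD register."""
    return SIMD_WIDTH_BITS // (8 * struct.calcsize(fmt))

x = 0.1
print(f"FP16: {struct.calcsize('<e')} bytes, {elements_per_vector('<e')} per vector, "
      f"error {abs(to_fp16(x) - x):.2e}")
print(f"FP32: {struct.calcsize('<f')} bytes, {elements_per_vector('<f')} per vector, "
      f"error {abs(to_fp32(x) - x):.2e}")
```

Halving the element size doubles the elements per vector and halves memory traffic per element, at the cost of a larger rounding error.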
High Performance in Real Applications

- WRF: Weather Research and Forecasting model (v3.8.1)
  - Vectorizing loops including IF statements is the key optimization
- Himeno Benchmark (Fortran90, size: XL)
  - Stencil calculation to solve Poisson's equation by the Jacobi method


[Figure: WRF v3.8.1 (48-hour, 12km, CONUS) on 48 cores. Skylake (Xeon Platinum 8168), 2 CPUs: 1.00; A64FX, 1 CPU with source tuning: 1.56. Values are normalized by the average elapsed time per timestep of Skylake.]
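Vectorizing a loop that contains an IF statement usually means replacing the branch with a per-lane mask, so that every lane runs the same instructions and the mask selects which results are kept. The sketch below shows that idea in plain Python; the loop body is a made-up example rather than WRF code, and it does not describe how the Fujitsu compiler implements the transformation.

```python
LANES = 8  # 512-bit register / 64-bit elements

def branchy(values, threshold):
    """Scalar reference: one branch per element."""
    out = []
    for v in values:
        if v > threshold:
            out.append(2.0 * v + 1.0)
        else:
            out.append(v)
    return out

def predicated(values, threshold, lanes=LANES):
    """Same result, processed in chunks with a per-lane mask instead of a branch."""
    out = []
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        mask = [v > threshold for v in chunk]        # predicate, one bit per lane
        computed = [2.0 * v + 1.0 for v in chunk]    # evaluated for every lane
        # Select: keep the computed value where the mask is set, else the input.
        out.extend(c if m else v for c, m, v in zip(computed, mask, chunk))
    return out

data = [0.5 * i - 3.0 for i in range(20)]
print(branchy(data, 0.0) == predicated(data, 0.0))
```

The last chunk is shorter than the lane count, which corresponds to a partially active predicate on real vector hardware.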
High memory B/W and long SIMD length work effectively.

[Figure: Himeno Benchmark (Fortran90)]
  Skylake (Xeon Platinum 8168), 2 CPUs    85
  FX100, 1 CPU                            103
  A64FX, 1 CPU                            346
  SX-Aurora TSUBASA, 1 VE†                (value not legible)
  Tesla V100, 1 GPU†                      305

† Performance evaluation of a vector supercomputer SX-aurora TSUBASA
https://dl.acm.org/citation.cfm?id=3291728
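The kind of kernel the Himeno Benchmark measures can be illustrated with a much smaller Jacobi solver. The sketch below solves a 2D Poisson problem with a 5-point stencil and zero boundary values; it is a simplified stand-in chosen for brevity, not the 3D kernel the benchmark actually runs.

```python
def jacobi_poisson_2d(f, h, iterations):
    """Jacobi sweeps for -laplace(u) = f on a square grid, u = 0 on the boundary."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each point is updated from the previous sweep's four neighbours.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]
                                    + h * h * f[i][j])
        u = new
    return u

def max_residual(u, f, h):
    """Largest absolute residual of the discrete equation over interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j]) / (h * h)
            worst = max(worst, abs(f[i][j] + lap))
    return worst

n = 9
h = 1.0 / (n - 1)
f = [[1.0] * n for _ in range(n)]
u = jacobi_poisson_2d(f, h, 200)
print(f"centre value: {u[n // 2][n // 2]:.5f}, residual: {max_residual(u, f, h):.2e}")
```

Every interior point reads only neighbouring values from the previous sweep, so the inner loop has no dependencies between iterations and performance is typically bound by memory bandwidth.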
High Efficiency in Key Features for AI Applications

- High FP16 & INT8 peak performance and high memory peak B/W
  - FP16 performance: 10.8+ TOPS, > 90% @ HGEMM
  - INT8 performance: 21.6+ TOPS in partial dot product
  - Memory B/W: 1,024 GB/s, > 80% @ STREAM Triad

INT8 partial dot product: $C = \sum_{i=0}^{3} (A_i \times B_i) + C$, with 8-bit elements $A_0..A_3$ and $B_0..B_3$ accumulated into a 32-bit $C$.

- Functions contributing to key features in AI fields
  - A64FX CPU
    - 2x 512-bit wide SIMD pipelines per core for FP16 and INT8
    - High memory B/W and calculation throughput
  - Compilers & libraries
    - Vectorization and software pipelining
    - FP16 as data type of programming language (e.g., real(kind=2) in Fortran)
    - Mathematical Library for HGEMM
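The INT8 partial dot product can be modelled in a few lines. In this sketch each 32-bit accumulator receives the sum of four 8-bit products, following the C = sum(Ai x Bi) + C formula on the slide; the grouping of four elements per accumulator and the wrap-around on overflow are assumptions, and the function names are illustrative only.

```python
INT8_MIN, INT8_MAX = -128, 127

def wrap_int32(x):
    """Wrap an integer into the signed 32-bit range (two's complement)."""
    return (x + 2**31) % 2**32 - 2**31

def partial_dot_int8(a, b, c):
    """For each accumulator c[k], add the sum of a[4k+i] * b[4k+i] for i in 0..3."""
    if len(a) != len(b) or len(a) != 4 * len(c):
        raise ValueError("need four 8-bit elements of a and b per accumulator")
    for v in list(a) + list(b):
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError(f"{v} does not fit in a signed 8-bit element")
    out = []
    for k, acc in enumerate(c):
        total = acc
        for i in range(4):
            total += a[4 * k + i] * b[4 * k + i]
        out.append(wrap_int32(total))
    return out

a = [1, 2, 3, 4, -128, -128, -128, -128]
b = [5, 6, 7, 8, -128, -128, -128, -128]
c = [10, 0]
print(partial_dot_int8(a, b, c))
```

With this grouping, a 512-bit register holds 64 INT8 inputs and feeds 16 32-bit accumulators, which keeps the sums from overflowing as quickly as they would in 8 or 16 bits.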
Future Plans

Supercomputer Fugaku
- Operations starting around CY2021

[Figure: timeline, CY2011 to CY2021, with Development and Operations phases; K computer photo © RIKEN]

Fujitsu HPC Products
- Fujitsu will begin global sales of supercomputers based on the Supercomputer Fugaku technology in the 2nd half of FY2019